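Front-end electronics equipped with high-speed digitizers are being used, and are proposed, for future nuclear detectors. Recent literature shows that deep learning models, especially one-dimensional convolutional neural networks, are promising for processing digital signals from nuclear detectors. Simulations and experiments have demonstrated the satisfactory accuracy and additional benefits of neural networks in this area. However, dedicated hardware that accelerates such models for online operation still needs to be studied. In this work, we introduce PulseDL-II, a system-on-chip (SoC) specially designed for extracting event features (time, energy, etc.) from pulses with deep learning. Building on the previous version, PulseDL-II incorporates a RISC CPU into the system structure for better functional flexibility and integrity. The neural network accelerator in the SoC adopts a three-level (arithmetic unit, processing element, neural network) hierarchical architecture that facilitates parameter optimization of the digital design. Furthermore, we devise a quantization scheme and associated implementation methods (rescale and bit shift) for full compatibility with deep learning frameworks (e.g., TensorFlow) within a selected subset of layer types. With the current scheme, quantization-aware training of neural networks is supported, and network models are automatically converted into RISC CPU software by dedicated scripts, with almost no loss of accuracy. We validate PulseDL-II on field-programmable gate arrays (FPGAs). Finally, system validation is carried out with an experimental setup consisting of a direct digital synthesis (DDS) signal generator and an FPGA development board with analog-to-digital converters (ADCs). The proposed system achieves a time resolution of 60 ps and an energy resolution of 0.40% with online neural network inference at a signal-to-noise ratio (SNR) of 47.4 dB.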
translated by 谷歌翻译
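The PulseDL-II abstract above mentions a quantization scheme implemented with a rescale and a bit shift. The following is a minimal sketch of that general fixed-point idea, not the authors' implementation: a real-valued scale is approximated by an integer multiplier divided by a power of two, so hardware needs only one multiply and one shift. The function names, the 15-bit shift, and the example scales are all assumptions made for illustration.

```python
def quantize_multiplier(real_scale: float, shift: int = 15) -> int:
    """Approximate real_scale by an integer M so that real_scale ~= M / 2**shift."""
    return round(real_scale * (1 << shift))


def requantize(acc: int, multiplier: int, shift: int = 15) -> int:
    """Rescale a wide integer accumulator to int8 with one multiply and one shift."""
    rounding = 1 << (shift - 1)            # half an LSB, so the shift rounds to nearest
    scaled = (acc * multiplier + rounding) >> shift
    return max(-128, min(127, scaled))     # saturate to the int8 range


# Hypothetical per-step scales: input 0.02, weight 0.005, output 0.05.
real_scale = 0.02 * 0.005 / 0.05           # combined scale, about 0.002
m = quantize_multiplier(real_scale)

small = requantize(10000, m)               # close to 10000 * 0.002
negative = requantize(-10000, m)           # the shift also handles negative values
saturated = requantize(10**6, m)           # out-of-range results are clipped
```

A larger `shift` gives a finer approximation of the real scale at the cost of a wider intermediate product, which is the kind of trade-off a digital design would have to settle.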
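Language-guided Embodied AI benchmarks that require an agent to navigate an environment and manipulate objects typically allow only one-way communication: the human user gives the agent a natural language command, and the agent can only follow it passively. We present DialFRED, a dialogue-enabled embodied instruction following benchmark based on the ALFRED benchmark. DialFRED allows an agent to actively ask the human user questions; the agent uses the additional information in the user's response to better complete its task. We release a human-annotated dataset with 53K task-relevant questions and answers, together with an oracle that can answer questions. To solve DialFRED, we propose a questioner-performer framework in which the questioner is pre-trained on the human-annotated data and fine-tuned with reinforcement learning. We make DialFRED publicly available and encourage researchers to propose and evaluate their solutions for building dialogue-enabled embodied agents.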
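Multilingual models are parameter-efficient and especially effective at improving low-resource languages by leveraging cross-lingual transfer. Despite recent advances in massively multilingual translation with ever-growing models and data, how to train multilingual models effectively is not well understood. In this paper, we show that a common situation in multilingual training, data imbalance among languages, creates optimization tension between high-resource and low-resource languages, where the multilingual solution found is often sub-optimal for low-resource languages. We show that the common training method of upsampling low-resource languages cannot robustly optimize the population loss, with risks of either underfitting high-resource languages or overfitting low-resource ones. Drawing on recent findings on the geometry of the loss landscape and its effect on generalization, we propose a principled optimization algorithm, Curvature Aware Task Scaling (CATS), which adaptively rescales gradients from different tasks with a meta-objective of guiding multilingual training toward low-curvature neighborhoods with uniformly low loss for all languages. We run experiments on common benchmarks (TED, WMT, and OPUS-100) with varying degrees of data imbalance. CATS effectively improves multilingual optimization and, as a result, demonstrates consistent gains on low-resource languages (+0.8 to +2.2 BLEU) without hurting high-resource ones. In addition, CATS is robust to overparameterization and large-batch training, making it a promising training method for massively multilingual models that truly improve low-resource languages.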
Brain midline shift (MLS) is one of the most critical factors to be considered in clinical diagnosis and treatment decision-making for intracranial hemorrhage. Existing computational methods for MLS quantification not only require intensive labeling at millimeter-level precision but also suffer from poor performance due to their dependence on specific landmarks or simplified anatomical assumptions. In this paper, we propose a novel semi-supervised framework to accurately measure the scale of MLS from head CT scans. We formulate the MLS measurement task as a deformation estimation problem and solve it using a few MLS slices with sparse labels. Meanwhile, with the help of diffusion models, we are able to use a large amount of unlabeled MLS data and 2793 non-MLS cases for representation learning and regularization. The extracted representation reflects how the image differs from a non-MLS image, and regularization plays an important role in the sparse-to-dense refinement of the deformation field. Our experiment on a real clinical brain hemorrhage dataset achieves state-of-the-art performance and generates interpretable deformation fields.
Current mainstream object detection methods for large aerial images usually divide large images into patches and then exhaustively detect the objects of interest on all patches, regardless of whether a patch contains any objects. This paradigm, although effective, is inefficient because the detectors have to go through all patches, which severely hinders inference speed. This paper presents an Objectness Activation Network (OAN) that helps detectors focus on fewer patches while achieving more efficient inference and more accurate results, enabling a simple and effective solution to object detection in large images. In brief, OAN is a light fully-convolutional network for judging whether each patch contains objects or not, which can be easily integrated into many object detectors and jointly trained with them end-to-end. We extensively evaluate our OAN with five advanced detectors. Using OAN, all five detectors achieve more than 30.0% speed-up on three large-scale aerial image datasets, along with consistent accuracy improvements. On extremely large Gaofen-2 images (29200$\times$27620 pixels), our OAN improves the detection speed by 70.5%. Moreover, we extend our OAN to driving-scene object detection and 4K video object detection, boosting the detection speed by 112.1% and 75.0%, respectively, without sacrificing accuracy. Code is available at https://github.com/Ranchosky/OAN.
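The patch-skipping idea in the abstract above can be illustrated with a small sketch: score every patch with a cheap objectness function and call the expensive detector only on patches that pass a threshold. This is an assumption-laden toy, not the OAN code from the linked repository; `toy_objectness`, `toy_detector`, the 0.5 threshold, and the list-of-rows image format are all hypothetical stand-ins.

```python
from typing import List, Sequence, Tuple

Patch = Tuple[int, int, List[List[float]]]  # (row offset, column offset, pixel rows)


def split_into_patches(image: Sequence[Sequence[float]], size: int) -> List[Patch]:
    """Cut a 2-D image (a list of rows) into non-overlapping size x size patches."""
    patches = []
    for r in range(0, len(image), size):
        for c in range(0, len(image[0]), size):
            rows = [list(row[c:c + size]) for row in image[r:r + size]]
            patches.append((r, c, rows))
    return patches


def detect_on_active_patches(image, size, objectness, detector, threshold=0.5):
    """Run `detector` only on patches whose objectness score reaches `threshold`."""
    results = []
    skipped = 0
    for r, c, rows in split_into_patches(image, size):
        if objectness(rows) < threshold:
            skipped += 1                  # judged empty: the detector is never called
            continue
        # Convert patch-local coordinates back to full-image coordinates.
        results.extend((r + dr, c + dc) for dr, dc in detector(rows))
    return results, skipped


# Toy stand-ins (hypothetical): an "object" is any pixel equal to 1.0.
def toy_objectness(rows):
    return max(max(row) for row in rows)


def toy_detector(rows):
    return [(i, j) for i, row in enumerate(rows) for j, v in enumerate(row) if v == 1.0]


image = [[0.0] * 4 for _ in range(4)]
image[3][2] = 1.0
detections, skipped = detect_on_active_patches(image, 2, toy_objectness, toy_detector)
```

In this toy, the speed-up comes entirely from the skipped patches, so it depends on how sparse the objects are and on how cheap the objectness check is relative to the detector.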
We study the problem of semantic segmentation calibration. For image classification, many solutions have been proposed to alleviate miscalibration of model confidence. However, to date, confidence calibration research on semantic segmentation is still limited. We provide a systematic study on the calibration of semantic segmentation models and propose a simple yet effective approach. First, we find that model capacity, crop size, multi-scale testing, and prediction correctness have an impact on calibration. Among them, prediction correctness, especially misprediction, contributes more to miscalibration because of over-confidence. Next, we propose a simple, unifying, and effective approach, namely selective scaling, which separates correct and incorrect predictions for scaling and focuses more on smoothing the logits of mispredictions. Then, we study popular existing calibration methods and compare them with selective scaling on semantic segmentation calibration. We conduct extensive experiments with a variety of benchmarks on both in-domain and domain-shift calibration, and show that selective scaling consistently outperforms other methods.
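As a rough illustration of the selective scaling idea described above, the sketch below applies a stronger temperature to predictions flagged as likely mispredictions, which lowers their confidence without changing the predicted class. It is a simplification under stated assumptions: the abstract does not say how mispredictions are identified, so the `likely_wrong` flag is simply passed in, and the temperatures 1.0 and 2.0 are arbitrary example values.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a temperature above 1 flattens the distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def selective_scaling(logits: List[float], likely_wrong: bool,
                      t_correct: float = 1.0, t_wrong: float = 2.0) -> List[float]:
    """Smooth the logits more strongly when the prediction is flagged as likely wrong."""
    return softmax(logits, t_wrong if likely_wrong else t_correct)


pixel_logits = [4.0, 1.0, 0.0]                # hypothetical logits for one pixel
confident = selective_scaling(pixel_logits, likely_wrong=False)
smoothed = selective_scaling(pixel_logits, likely_wrong=True)
```

Because dividing by a positive temperature preserves the ordering of the logits, the segmentation output is unchanged; only the reported confidence moves.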
In this paper, we propose a large-scale language pre-training approach for text GENeration using dIffusion modEl, named GENIE. GENIE is a pre-trained sequence-to-sequence text generation model that combines Transformer and diffusion. The diffusion model accepts latent information from the encoder, which is used to guide the denoising at the current time step. After multiple such denoising iterations, the diffusion model can restore Gaussian noise to diverse output text that is controlled by the input text. Moreover, this architecture design also allows us to apply large-scale pre-training to GENIE. We propose a novel pre-training method named continuous paragraph denoise, based on the characteristics of the diffusion model. Extensive experiments on the XSum, CNN/DailyMail, and Gigaword benchmarks show that GENIE achieves performance comparable to various strong baselines, and that pre-training in particular greatly improves its generation quality. We have also conducted extensive experiments on the generation diversity and parameter impact of GENIE. The code for GENIE will be made publicly available.
Developing autonomous vehicles (AVs) helps improve the road safety and traffic efficiency of intelligent transportation systems (ITS). Accurately predicting the trajectories of traffic participants is essential to the decision-making and motion planning of AVs in interactive scenarios. Recently, learning-based trajectory predictors have shown state-of-the-art performance in highway and urban areas. However, most existing learning-based models trained with fixed datasets may perform poorly in continuously changing scenarios. Specifically, they may no longer perform well in previously learned scenarios after learning a new one. This phenomenon is called "catastrophic forgetting". Few studies investigate trajectory prediction in continuous scenarios, where catastrophic forgetting may happen. To handle this problem, first, a novel continual learning (CL) approach for vehicle trajectory prediction is proposed in this paper. Then, inspired by brain science, a dynamic memory mechanism is developed that uses a measurement of traffic divergence between scenarios to balance the performance and training efficiency of the proposed CL approach. Finally, datasets collected from different locations are used to design continual training and testing methods in experiments. Experimental results show that the proposed approach achieves consistently high prediction accuracy in continuous scenarios without re-training, which mitigates catastrophic forgetting compared to non-CL approaches. The implementation of the proposed approach is publicly available at https://github.com/BIT-Jack/D-GSM
Data compression is becoming critical for storing scientific data because many scientific applications need to store large amounts of data and post-process this data for scientific discovery. Unlike image and video compression algorithms that limit errors to primary data, scientists require compression techniques that accurately preserve derived quantities of interest (QoIs). This paper presents a physics-informed compression technique implemented as an end-to-end, scalable, GPU-based pipeline for data compression that addresses this requirement. Our hybrid compression technique combines machine learning techniques and standard compression methods. Specifically, we combine an autoencoder, an error-bounded lossy compressor to provide guarantees on raw data error, and a constraint satisfaction post-processing step to preserve the QoIs within a minimal error (generally less than floating point error). The effectiveness of the data compression pipeline is demonstrated by compressing nuclear fusion simulation data generated by a large-scale fusion code, XGC, which produces hundreds of terabytes of data in a single day. Our approach works within the ADIOS framework and achieves compression by a factor of more than 150 while requiring only a few percent of the computational resources necessary for generating the data, making the overall approach highly effective for practical scenarios.
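To make the notion of an error-bounded lossy step concrete, here is a minimal sketch of uniform quantization with a user-chosen absolute error bound. It is only one common building block of such compressors and is an assumption on our part: the abstract does not name the compressor used, and the autoencoder and QoI-preserving post-processing stages are not shown. The function names and sample values are hypothetical.

```python
from typing import List


def quantize(values: List[float], error_bound: float) -> List[int]:
    """Map each value to the index of a bin of width 2 * error_bound."""
    step = 2.0 * error_bound
    return [round(v / step) for v in values]


def dequantize(codes: List[int], error_bound: float) -> List[float]:
    """Reconstruct each value as the centre of its bin."""
    step = 2.0 * error_bound
    return [q * step for q in codes]


data = [0.013, -1.207, 3.1416, 2.5]      # hypothetical raw samples
eb = 0.01                                # requested absolute error bound
codes = quantize(data, eb)               # small integers, cheap to entropy-code
restored = dequantize(codes, eb)
max_error = max(abs(a - b) for a, b in zip(data, restored))
```

Every value lies within half a bin width of its bin centre, so the reconstruction error stays within `eb` up to floating-point rounding; the integer codes are what a lossless back end would then compress.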
Inspired by the recent success of Transformers in Natural Language Processing and of the Vision Transformer in Computer Vision, many researchers in the medical imaging community have flocked to Transformer-based networks for various mainstream medical tasks such as classification, segmentation, and estimation. In this study, we analyze two recently published Transformer-based network architectures for the task of multimodal head-and-neck tumor segmentation and compare their performance to the de facto standard 3D segmentation network, the nnU-Net. Our results show that modeling long-range dependencies may be helpful in cases where large structures are present and/or a large field of view is needed. However, for small structures such as head-and-neck tumors, the convolution-based U-Net architecture seems to perform well, especially when the training dataset is small and computational resources are limited.